Czech Audio-Visual Speech Synthesis with an HMM-trained Speech Database and Enhanced Coarticulation

Author

  • MILOS ZELEZNY

Abstract

Visual speech synthesis is usually performed by concatenating basic speech units selected from a visual speech database; the acoustic part is synthesized separately by a similar method. This process raises two main problems. The first is database design, i.e. estimating the database parameters for all basic speech units. The second is how to concatenate the selected basic phonetic units so as to eliminate the coarticulation effect. Our work addresses both problems, resulting in a Czech audio-visual speech synthesizer. Instead of some form of averaging, we use HMM training to obtain statistically optimal parameters for all basic phonetic units. To handle coarticulation, we use the method of dominance functions. This paper presents the new Czech audio-visual synthesizer. The designed talking head now produces intelligible speech and is ready for further work on adopting a real-head look, which will involve adapting the model to real head shapes and applying a texture.

Key-Words: audio-visual speech synthesis, hidden Markov models, coarticulation.
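The dominance-function approach mentioned above can be sketched as follows. This is a minimal illustration in the style of the Cohen-Massaro coarticulation model: each phone contributes a target value for a visual parameter (e.g. lip opening), and a dominance function decaying away from the phone's center weights the targets into a smooth trajectory. The function names, the negative-exponential shape, and all parameter values here are illustrative assumptions, not the authors' exact formulation.

```python
import math

def dominance(t, center, alpha=1.0, theta=0.3, c=1.0):
    # Negative-exponential dominance of a speech segment, peaking at the
    # segment's center time (t and center in milliseconds). alpha, theta
    # and c are illustrative shape parameters.
    return alpha * math.exp(-theta * abs(t - center) ** c)

def blend_parameter(t, segments):
    # Dominance-weighted average of per-segment targets: the value of the
    # visual parameter at time t.
    # segments: list of (center_time_ms, target_value) pairs.
    num = sum(dominance(t, center) * target for center, target in segments)
    den = sum(dominance(t, center) for center, _ in segments)
    return num / den

# Two neighbouring phones with different lip-opening targets:
segments = [(0.0, 0.2), (100.0, 0.8)]
# Sampling the blended trajectory yields a smooth transition between
# the two targets rather than an abrupt jump at the phone boundary.
trajectory = [blend_parameter(t, segments) for t in range(0, 101, 25)]
```

Because every phone's dominance extends into its neighbours, the generated trajectory anticipates upcoming targets, which is exactly the coarticulation effect that plain unit concatenation fails to capture.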


Similar articles

INTERSPEECH 2006: Using Dominance Functions and Audio-Visual Speech Synthesis

This paper presents results of training coarticulation models for Czech audio-visual speech synthesis. Two approaches to coarticulation in audio-visual speech synthesis were used: coarticulation based on dominance functions, and visual unit selection. Coarticulation models were trained for both approaches. Models for the unit-selection approach were trained on visually clustered data...


HMM-based visual speech synthesis using dynamic visemes

In this paper we incorporate dynamic visemes into hidden Markov model (HMM)-based visual speech synthesis. Dynamic visemes represent intuitive visual gestures identified automatically by clustering purely visual speech parameters. They have the advantage of spanning multiple phones and so they capture the effects of visual coarticulation explicitly within the unit. The previous application of d...


Using HMMs and ANNs for mapping acoustic to visual speech

In this paper we present two different methods for mapping auditory, telephone-quality speech to visual parameter trajectories specifying the movements of an animated synthetic face. In the first method, hidden Markov models (HMMs) were used to obtain phoneme strings and time labels. These were then transformed by rules into parameter trajectories for visual speech synthesis. In the second m...


Text-to-visual speech synthesis based on parameter generation from HMM

This paper presents a new technique for synthesizing visual speech from arbitrarily given text. The technique is based on an algorithm for parameter generation from HMM with dynamic features, which has been successfully applied to text-to-speech synthesis. In the training phase, syllable HMMs are trained with visual speech parameter sequences that represent lip movements. In the synthesis phase...


Photo-real lips synthesis with trajectory-guided sample selection

In this paper, we propose an HMM trajectory-guided, real-image sample concatenation approach to photo-real talking head synthesis. It renders a smooth and natural video of the articulators in sync with given speech signals. An audio-visual database is first used to train a statistical hidden Markov model (HMM) of lip movement, and the trained model is then used to generate a visual parameter trajec...



Journal title:

Volume   Issue 

Pages  -

Publication date: 2003